This work introduces a novel knowledge distillation framework for classification tasks in which information on existing subclasses is available and taken into consideration. In classification tasks with a small number of classes or binary detection, the amount of information transferred from teacher to student is restricted, limiting the utility of knowledge distillation. Performance can be improved by leveraging information about possible subclasses within the classes. To that end, we propose the so-called Subclass Knowledge Distillation (SKD), the process of transferring the knowledge of predicted subclasses from a teacher to a smaller student. Meaningful information that is absent from the teacher's class logits but present in the subclass logits (e.g., similarities within a class) is conveyed to the student through SKD and in turn improves the student's performance. Analytically, we measure how much extra information the teacher can provide to the student via SKD to demonstrate the efficacy of our work. The developed framework is evaluated on a clinical application, namely colorectal polyp classification: a practical problem with two classes and a number of subclasses per class. In this application, annotations provided by clinicians are used to define subclasses based on the way the annotation labels are learned. A lightweight, low-complexity student trained with the SKD framework achieves an F1-score of 85.05%, a 1.47% and 2.10% gain over the student trained with and without conventional knowledge distillation, respectively. The 2.10% F1-score gap between students trained with and without SKD can be explained by the extra subclass knowledge, i.e., the extra 0.4656 label bits per sample that can be transferred in our experiments.
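To make the idea concrete, below is a minimal sketch of what an SKD-style training objective could look like in PyTorch, assuming the teacher and student both emit subclass logits and each subclass maps to a parent class. All names, the temperature, and the loss weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def skd_loss(student_sub_logits, teacher_sub_logits, class_labels,
             subclass_to_class, T=4.0, alpha=0.5):
    """Hypothetical subclass knowledge distillation loss.

    student_sub_logits, teacher_sub_logits: (batch, n_subclasses)
    class_labels: (batch,) ground-truth class indices
    subclass_to_class: (n_subclasses,) tensor mapping each subclass to its class
    """
    # Distill over subclass logits: soft teacher targets at temperature T.
    soft_targets = F.softmax(teacher_sub_logits / T, dim=-1)
    log_student = F.log_softmax(student_sub_logits / T, dim=-1)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * T * T

    # Pool subclass probabilities into class probabilities for the hard loss.
    n_classes = int(subclass_to_class.max().item()) + 1
    student_probs = F.softmax(student_sub_logits, dim=-1)
    class_probs = torch.zeros(student_probs.size(0), n_classes,
                              device=student_probs.device)
    class_probs.index_add_(1, subclass_to_class, student_probs)
    ce = F.nll_loss(torch.log(class_probs + 1e-12), class_labels)

    return alpha * kd + (1 - alpha) * ce
```

With, say, `subclass_to_class = torch.tensor([0, 0, 1, 1])` (two binary-task classes, each split into two subclasses), the student receives gradient signal over four soft targets rather than two, which is where the extra transferred information would come from.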
This work introduces a novel knowledge distillation framework for classification tasks in which information on existing subclasses is available and taken into consideration. In classification tasks with a small number of classes or binary detection (two classes), the amount of information transferred from the teacher to the student network is restricted, limiting the utility of knowledge distillation. Performance can be improved by leveraging information about possible subclasses within the available classes of the classification task. To that end, we propose the so-called Subclass Knowledge Distillation (SKD) framework, the process of transferring the subclasses' prediction knowledge from a large teacher model to a smaller student. Through SKD, additional meaningful information that is not in the teacher's class logits but exists in the subclasses (e.g., similarities within a class) is conveyed to the student and boosts its performance. Mathematically, we measure how much extra information the teacher can provide to the student through the SKD framework. The developed framework is evaluated on a clinical application, namely colorectal polyp classification. In this application, annotations provided by clinicians are used to define subclasses based on the way the annotation labels are learned. A lightweight, low-complexity student trained with the proposed framework achieves an F1-score of 85.05%, an improvement of 2.14% and a 1.49% gain over the student trained with and without conventional knowledge distillation, respectively. These results show that the extra subclass knowledge (i.e., 0.4656 label bits per training sample in our experiments) can provide more information about the teacher's generalization, and hence SKD can benefit from using more information to improve the student's performance.
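One way to read the "0.4656 label bits per training sample" figure is as the conditional entropy of the subclass label S given the class label C; this interpretation, and the subclass proportions below, are illustrative assumptions rather than the paper's derivation. If each of the two classes splits into two subclasses occurring with proportions 0.9 and 0.1, the class prior sums out and:

```latex
H(S \mid C) = -\sum_{c} p(c) \sum_{s \in c} p(s \mid c)\,\log_2 p(s \mid c)
            = -\left(0.9 \log_2 0.9 + 0.1 \log_2 0.1\right) \approx 0.469 \ \text{bits},
```

the same order of magnitude as the 0.4656 bits per sample reported.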
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely oversee the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
This paper presents our solutions for the MediaEval 2022 task on DisasterMM. The task is composed of two subtasks, namely (i) Relevance Classification of Twitter Posts (RCTP) and (ii) Location Extraction from Twitter Texts (LETT). The RCTP subtask aims at differentiating flood-related from non-relevant social posts, while LETT is a Named Entity Recognition (NER) task aimed at extracting location information from text. For RCTP, we proposed four different solutions based on BERT, RoBERTa, DistilBERT, and ALBERT, obtaining F1-scores of 0.7934, 0.7970, 0.7613, and 0.7924, respectively. For LETT, we used three models, namely BERT, RoBERTa, and DistilBERT, obtaining F1-scores of 0.6256, 0.6744, and 0.6723, respectively.
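As an illustration of how one of the RCTP solutions could be set up, here is a hedged sketch of fine-tuning RoBERTa for binary relevance classification with the Hugging Face Trainer; the data files, column names, and hyperparameters are assumptions, not the authors' configuration.

```python
# Sketch: fine-tuning RoBERTa for flood-relevance classification (RCTP).
# File names, column names, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base",
                                                           num_labels=2)

# Assumed CSVs with "text" and "label" columns (1 = flood-related, 0 = not).
data = load_dataset("csv", data_files={"train": "rctp_train.csv",
                                       "validation": "rctp_dev.csv"})
data = data.map(lambda b: tokenizer(b["text"], truncation=True, max_length=128),
                batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="rctp-roberta", num_train_epochs=3,
                           per_device_train_batch_size=16,
                           evaluation_strategy="epoch"),
    train_dataset=data["train"],
    eval_dataset=data["validation"],
)
trainer.train()
```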
In recent years, social media has been widely explored as a potential source of communication and information in disasters and emergency situations. Several interesting works and case studies of disaster analytics, exploring different aspects of natural disasters, have already been conducted. Alongside this great potential, disaster analytics comes with several challenges, mainly due to the nature of social media content. In this paper, we explore one such challenge and propose a text classification framework to deal with noisy Twitter data. More specifically, we employed several transformers, both individually and in combination, to differentiate between relevant and non-relevant Twitter posts, achieving a highest F1-score of 0.87.
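The "individually and in combination" part can be read as late fusion; below is a minimal sketch that averages class probabilities from several fine-tuned checkpoints. The checkpoint names are hypothetical, and this is one common combination scheme rather than necessarily the one used in the paper.

```python
# Sketch: late fusion of several fine-tuned transformers by averaging
# their class probabilities. Checkpoint names are hypothetical.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoints = ["bert-finetuned-floods", "roberta-finetuned-floods"]  # placeholders

def ensemble_predict(texts):
    probs = []
    for ckpt in checkpoints:
        tok = AutoTokenizer.from_pretrained(ckpt)
        model = AutoModelForSequenceClassification.from_pretrained(ckpt).eval()
        enc = tok(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            probs.append(F.softmax(model(**enc).logits, dim=-1))
    # Average per-model probabilities, then take the argmax class.
    return torch.stack(probs).mean(dim=0).argmax(dim=-1)  # 1 = relevant, 0 = not
```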
Osteoarthritis (OA) is the most prevalent chronic joint disease worldwide, and the knee accounts for more than 80% of commonly affected joints. Knee OA is not yet a curable disease, and it affects large numbers of patients, making it costly to patients and healthcare systems alike. The etiology, diagnosis, and treatment of knee OA can be confounded by variability in its clinical and physical manifestations. Although knee OA carries a body of well-known terminology aimed at standardizing the nomenclature of diagnosis, prognosis, treatment, and clinical outcomes for this chronic joint disease, in practice a wide range of terminology is associated with knee OA across different data sources, including but not limited to biomedical literature, clinical notes, healthcare literacy, and health-related social media. Among these data sources, scientific articles published in the biomedical literature usually provide a principled pipeline for studying the disease. Rapid yet accurate text mining of large-scale scientific literature may uncover novel knowledge and terminology to better understand knee OA and to improve the quality of knee OA diagnosis, prevention, and treatment. The present work aims to utilize artificial neural network strategies to automatically extract vocabularies associated with knee OA. Our findings indicate the feasibility of developing word-embedding neural networks for autonomous keyword extraction and abstraction of knee OA.
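A minimal sketch of the word-embedding approach, assuming a plain-text corpus of knee-OA abstracts; the corpus file, hyperparameters, and seed term are illustrative, not the paper's setup.

```python
# Sketch: learning word embeddings over knee-OA literature and extracting
# related vocabulary via nearest-neighbour queries. Corpus file is hypothetical.
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

with open("knee_oa_abstracts.txt") as f:   # assumed: one abstract per line
    sentences = [simple_preprocess(line) for line in f]

model = Word2Vec(sentences, vector_size=200, window=5,
                 min_count=5, sg=1, epochs=10)

# Candidate knee-OA vocabulary: terms closest to a seed word in embedding space.
for term, score in model.wv.most_similar("osteoarthritis", topn=10):
    print(f"{term}\t{score:.3f}")
```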
Neural models that do not rely on pre-training have excelled in the keyphrase generation task when large annotated datasets are available. Meanwhile, new approaches have incorporated pre-trained language models (PLMs) for their data efficiency. However, a systematic study of how the two types of approaches compare, and of how different design choices affect the performance of PLM-based models, has been lacking. To fill this knowledge gap and facilitate a more informed use of PLMs for keyphrase extraction and keyphrase generation, we present an in-depth empirical study. Formulating keyphrase extraction as sequence labeling and keyphrase generation as sequence-to-sequence generation, we perform extensive experiments in three domains. After showing that PLMs have competitive high-resource performance and state-of-the-art low-resource performance, we investigate important design choices, including in-domain PLMs, PLMs with different pre-training objectives, using PLMs under a parameter budget, and different formulations for present keyphrases. Our results further show that (1) in-domain BERT-like PLMs can be used to build strong and data-efficient keyphrase generation models; (2) with a fixed parameter budget, prioritizing model depth over width and allocating more layers to the encoder leads to better encoder-decoder models; and (3) by introducing four in-domain PLMs, we achieve competitive performance in the news domain and state-of-the-art performance in the scientific domain.
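To make the sequence-labeling formulation concrete, here is a hedged sketch that tags tokens with a BIO scheme and decodes the predicted spans back into keyphrases; the checkpoint and label set are assumptions, and the model would of course need fine-tuning before its predictions mean anything.

```python
# Sketch: keyphrase extraction as BIO sequence labeling with a token classifier.
# Checkpoint and label scheme are illustrative placeholders.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

labels = ["O", "B-KP", "I-KP"]
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels))

text = "We study keyphrase generation with pre-trained language models."
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    pred = model(**enc).logits.argmax(dim=-1)[0]

# Group consecutive B-/I- tagged tokens back into keyphrase spans.
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
spans, current = [], []
for tok, tag_id in zip(tokens, pred.tolist()):
    tag = labels[tag_id]
    if tag == "B-KP":
        if current:
            spans.append(current)
        current = [tok]
    elif tag == "I-KP" and current:
        current.append(tok)
    else:
        if current:
            spans.append(current)
        current = []
if current:
    spans.append(current)
print([tokenizer.convert_tokens_to_string(s) for s in spans])
```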
Privacy policies provide individuals with information about their rights and how their personal information is handled. Natural language understanding (NLU) technologies can help individuals and practitioners better understand the privacy practices described in lengthy and complex documents. However, existing efforts that use NLU technologies are limited to processing the language for a single task focused on certain privacy practices. To this end, we introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, a multi-task benchmark for evaluating privacy policy language understanding across various tasks. We also collect a large corpus of privacy policies to enable privacy-policy domain-specific language model pre-training. We demonstrate that domain-specific pre-training offers performance improvements across all tasks. We release the benchmark to encourage future research in this domain.
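A minimal sketch of the domain-specific pre-training step, assuming masked language modeling over a plain-text privacy-policy corpus; the base checkpoint, file name, and hyperparameters are placeholders, not the PLUE setup.

```python
# Sketch: continued masked-language-model pre-training on a privacy-policy
# corpus. File name and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

corpus = load_dataset("text", data_files="privacy_policies.txt")["train"]
corpus = corpus.map(lambda b: tokenizer(b["text"], truncation=True, max_length=256),
                    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="policy-bert", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=corpus,
    # Collator randomly masks 15% of tokens to create the MLM objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
```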
While pre-trained language models (LMs) for code have achieved great success in code completion, they generate code conditioned only on the contents of the current file, i.e., in-file context, and ignore the rich semantics of other files in the same project, i.e., cross-file context, a critical source of information that is especially useful in modern modular software development. This oversight constrains code language models' capacity for code completion, leading to unexpected behaviors such as generating hallucinated class member functions or function calls with unexpected arguments. In this work, we develop a cross-file context finder tool, CCFINDER, that effectively locates and retrieves the most relevant cross-file context. We propose CoCoMIC, a framework that learns jointly from in-file and cross-file context on top of pre-trained code LMs. When cross-file context is provided, CoCoMIC improves the existing code LM with a 19.30% relative increase in exact match and a 15.41% relative increase in identifier matching for code completion.
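The general recipe behind cross-file context can be sketched as: resolve a file's project-local imports, pull in their source as extra context, and prepend it to the completion prompt. The toy below does exactly that for Python projects; it is a stand-in for intuition only, not the CCFINDER/CoCoMIC implementation.

```python
# Toy sketch of cross-file context retrieval: gather the source of
# project-local modules imported by the unfinished file and prepend it
# to the in-file context before asking a code LM to complete.
import ast
from pathlib import Path

def imported_modules(source: str) -> set:
    """Top-level module names imported by `source`."""
    mods = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            mods.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            mods.add(node.module.split(".")[0])
    return mods

def build_prompt(file_path: str, project_root: str,
                 max_context_chars: int = 4000) -> str:
    """Prepend cross-file context (imported project modules) to in-file context."""
    in_file = Path(file_path).read_text()
    pieces = []
    for module in sorted(imported_modules(in_file)):
        candidate = Path(project_root) / f"{module}.py"
        if candidate.exists():  # keep only modules defined inside the project
            pieces.append(f"# --- cross-file context: {candidate.name} ---\n"
                          + candidate.read_text())
    context = "\n".join(pieces)[:max_context_chars]
    return context + "\n" + in_file
```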
Deep learning can extract rich data representations if provided with sufficient quantities of labeled training data. For many tasks, however, annotating data carries significant costs in time and money, owing to the high standards of subject-matter expertise required, for example in medical and geophysical image interpretation tasks. Active learning can identify the most informative training examples for the interpreter to label, leading to higher efficiency. We propose an active learning method based on jointly learning representations for supervised and unsupervised tasks. The learned manifold structure is later used to identify, via the error profiles on the unsupervised task, the informative training samples most dissimilar from the learned manifold. We verify the efficiency of the proposed method on a seismic facies segmentation dataset from the Netherlands F3 block survey, significantly outperforming contemporary methods and achieving the highest mean Intersection-over-Union value of 0.773.
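A hedged sketch of the selection step: score unlabeled samples by their error on the unsupervised (reconstruction) task and send the worst-reconstructed, i.e., most off-manifold, samples to the interpreter first. The autoencoder, feature shapes, and budget are illustrative, not the paper's architecture.

```python
# Sketch: ranking unlabeled samples for annotation by their reconstruction
# error, so samples farthest from the learned manifold are labeled first.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """Illustrative stand-in for the jointly trained unsupervised branch."""
    def __init__(self, dim=1024, hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.dec = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.dec(self.enc(x))

def select_for_labeling(model, unlabeled, budget=32):
    """Return indices of the `budget` samples with the largest reconstruction error."""
    model.eval()
    with torch.no_grad():
        errors = ((model(unlabeled) - unlabeled) ** 2).mean(dim=1)
    return torch.topk(errors, k=budget).indices

# Usage with random stand-in data:
pool = torch.randn(500, 1024)
picked = select_for_labeling(TinyAutoencoder(), pool)
```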